Abstract: In this paper, we consider heterogeneous distributed storage systems (DSSs) with flexible reconstruction degree, where each node in the system has dynamic repair bandwidth and dynamic storage capacity. In particular, a data collector can reconstruct the file at time t using some arbitrary nodes in the system, and an arbitrary failed node can be repaired by some set of arbitrary nodes. Using the min-cut bound, we investigate the fundamental tradeoff between storage and repair cost for our model of heterogeneous DSS. In particular, the problem is formulated as a bi-objective linear programming problem. For an arbitrary DSS, it is shown that the calculated min-cut bound is tight.

I. Introduction 

Cloud storage is a distributed storage system (DSS) in which information is stored on distinct nodes as encoded packets in a redundant manner. One can retrieve the file by contacting certain nodes in the system, and a failed node can be repaired using other nodes in the system. For such DSSs, one has to optimize various system parameters such as storage capacity, repair bandwidth, availability, reliability, security and scalability. Such DSSs are used in many commercial systems, for example those of Facebook, Yahoo, IBM, Amazon and Microsoft Windows Azure [1]-[?].

In homogeneous DSSs (where each node has the same storage capacity and the same repair degree) [?], encoded data packets of a file of size $B$ are distributed among $n$ nodes (each with storage capacity $\alpha$) such that by connecting to any $k$ ($< n$) nodes, one can retrieve the whole file. In the case of an arbitrary node failure, the system is repaired by downloading $\beta$ packets from any $d$ ($< n$) nodes, called helper nodes [?]. In these systems, one can provide reliability by simply replicating or encoding the message data packets. Simple replication is inefficient in terms of storage. On the other hand, encoding data packets with erasure MDS (maximum distance separable) codes is inefficient in terms of bandwidth during the node repair process. To optimize these conflicting parameters, Dimakis et al. [?] introduced regenerating codes in a seminal work. In [?], [?], the tradeoff between storage capacity $\alpha$ and repair bandwidth $d\beta$ is analyzed by plotting the tradeoff curve for regenerating codes. All points on the tradeoff curve can be achieved by linear network codes over finite fields [?], [?]. Minimizing the two parameters in different orders along the tradeoff curve yields Minimum Bandwidth Regenerating (MBR) codes and Minimum Storage Regenerating (MSR) codes [?]. The tradeoff between storage and repair bandwidth for exact repair is studied in [?]. In [?], Shah et al. calculated a cut-set lower bound on repair bandwidth for a special flexible setting of homogeneous DSSs. In a nice survey [?], an overview of some existing results and repair models for DSSs is given. Recently, in [?], the tradeoff between storage capacity and repair bandwidth was investigated for exact-repair linear regenerating codes with $k = d = n - 1$.

Fig. 1. A model of the considered heterogeneous DSS. In the heterogeneous DSS, each node has flexible storage capacity $\alpha_i$ ($i \in \{1, 2, \dots, n\}$) and flexible repair bandwidth. At time $t$ (in particular, $t \in \{t_1, t_2\}$), the flexible reconstruction degree of a data collector is $k_t$. The repair degree for an arbitrary node failure is also dynamic with respect to time. At time $t$, a failed node $U_i$ is repaired by some nodes.
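For the homogeneous case discussed above, the tradeoff can be made concrete with the well-known cut-set bound of Dimakis et al., $B \le \sum_{i=0}^{k-1}\min\{\alpha, (d-i)\beta\}$, whose two extreme points are the MSR and MBR points. The Python sketch below evaluates the bound with exact fractions and checks that both points meet it with equality; the parameters $B = 12$, $k = 3$, $d = 4$ are arbitrary illustrative values, not taken from this paper.

```python
from fractions import Fraction


def cut_set_bound(alpha, beta, k, d):
    """Largest file size supported by a homogeneous (n, k, d, alpha, beta) DSS."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(B, k, d):
    """Minimum-storage point: alpha = B/k, beta = B/(k(d-k+1))."""
    return Fraction(B, k), Fraction(B, k * (d - k + 1))


def mbr_point(B, k, d):
    """Minimum-bandwidth point: beta = 2B/(k(2d-k+1)), alpha = d*beta."""
    beta = Fraction(2 * B, k * (2 * d - k + 1))
    return d * beta, beta


B, k, d = 12, 3, 4
for name, (alpha, beta) in (("MSR", msr_point(B, k, d)), ("MBR", mbr_point(B, k, d))):
    print(name, alpha, beta, cut_set_bound(alpha, beta, k, d))
# MSR 4 2 12
# MBR 16/3 4/3 12
```

Both points reach the bound exactly ($12 = B$), which is what makes them the two ends of the tradeoff curve.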

Heterogeneous DSSs are closer to real-world scenarios, where the storage nodes are not necessarily uniform in all characteristics because of their geographical environment, storage device cost, etc.

Many such heterogeneous DSSs have been studied recently [?]. In [18]-[20], the storage allocation problem is investigated to maximize the probability of successful recovery. For heterogeneous DSSs, [?] proved that repair cost can be reduced by allowing helper nodes to encode the codewords of other nodes. In [?], Akhlaghi et al. investigated the tradeoff between storage capacity and repair bandwidth for generalized regenerating codes and showed that each point on the curve is achievable. In a generalized regenerating code, the set of all nodes is divided into two partitions, and every node in partition $i$ has uniform parameters $(\alpha_i, d_i, \beta_i)$ ($\forall i \in \{1, 2\}$) [?]. In [22], Ernvall et al. calculated capacity bounds for a heterogeneous DSS with dynamic repair bandwidth. The tradeoff curve is explored for a non-homogeneous two-rack model of DSS in [23]. In [24], the capacity bound is calculated for heterogeneous DSSs with dynamic repair bandwidth, where node repair is done by some specific helper nodes. For that heterogeneous DSS, the storage node capacity depends on the repair bandwidth of each rack. In [25], the tradeoff between system storage cost and system repair cost is investigated for heterogeneous DSSs with dynamic storage and repair cost.

Fig. 2. A file of size 4 units ($= B$) is encoded into 11 packets over the field $\mathbb{F}_q$. These packets are distributed among 5 ($= n$) nodes in such a way that any data collector can download the whole file by contacting at most 3 ($= k$) nodes. In this heterogeneous DSS, $\alpha = (\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5) = (2, 2, 2, 3, 2)$. A failed node can be repaired by at most 2 ($= d$) nodes. Functional and exact repairs are shown for the failure of node $U_5$ with the help of surviving sets $S_5^{(1)} = \{U_4\}$ and $S_5^{(3)} = \{U_1, U_3\}$. Surviving set $S_5^{(3)}$ is not considered in Table I since $S_5^{(2)} \subset S_5^{(3)}$.

In this work, we consider a heterogeneous DSS where a file of size $B$ is distributed among $n$ nodes, each with a different storage capacity. File reconstruction is done in a flexible manner: at any time instant $t$, a data collector can reconstruct the file by connecting to some $k_t$ nodes. Hence the reconstruction degree $k_t$ of a file is flexible with respect to time and the number of nodes. On the other hand, a failed node $U_i$ ($1 \le i \le n$) can be repaired at time $t$ by downloading packets from some $d_i^t$ nodes. Hence the repair degree $d_i^t$ is also flexible with respect to time and the number of nodes. Repair of a failed node can be done in two ways: exact repair and functional repair. If the recovered packets in the repair process are exact copies of the lost packets, the repair is called exact repair. On the other hand, if the recovered packets are some function of the lost packets, the repair is called functional repair. The model of such a heterogeneous DSS is shown in Figure 1. A data collector reconstructs a distributed file by connecting to $k_{t_j}$ ($j \in \{1, 2\}$) nodes at time $t_j$. In addition, a failed node $U_i$ is repaired by $d_i^{t_j}$ nodes at time $t_j$.

An example of such a heterogeneous DSS is considered in Figure 2. In this system, a file of size $B = 4$ is divided into 4 message information packets $x_1$, $x_2$, $x_3$ and $x_4$. The message packets are encoded into 11 packets by taking linear combinations of them: $y_1 = x_1$, $y_2 = x_2$, $y_3 = x_3$, $y_4 = x_1 + x_2$, $y_5 = x_4$, $y_6 = x_1 + x_2$, $y_7 = x_1$, $y_8 = x_3$, $y_9 = x_2 + x_4$, $y_{10} = x_2$ and $y_{11} = x_1 + x_4$. The encoded packets $y_m$ ($m \in [11]$) are distributed on the 5 nodes such that packets $y_1$ and $y_2$ are stored on node $U_1$, packets $y_3$ and $y_4$ on node $U_2$, packets $y_5$ and $y_6$ on node $U_3$, packets $y_7$, $y_8$ and $y_9$ on node $U_4$, and the remaining two packets $y_{10}$ and $y_{11}$ on node $U_5$. Clearly the storage node capacity is $\alpha_i = 2$ ($i \in [5]\setminus\{4\}$) and $\alpha_4 = 3$. In this example, if node $U_5$ fails, it can be repaired by downloading packets $y_7$ and $y_9$ from node $U_4$. Since the recovered packets are functions of the lost packets, this is a functional repair. On the other hand, node $U_5$ can be repaired exactly by downloading packets $y_1$, $y_2$ and $y_5$ from nodes $U_1$ and $U_3$ and computing $y_{10} = y_2$ and $y_{11} = y_1 + y_5$.
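The example can be checked mechanically. The Python sketch below encodes each packet $y_m$ as its coefficient vector over $(x_1, x_2, x_3, x_4)$ as printed above ($y_6$ and $y_7$ are transcribed literally), reads off the storage vector, verifies the exact repair of $U_5$, and tests whether a set of nodes spans the file through a rank computation. The paper only says the field is $\mathbb{F}_q$; working over GF(2) (`p = 2`) is an assumption of this sketch, and `can_reconstruct` is a hypothetical helper, not part of the paper.

```python
# Coefficient vectors of y1..y11 over (x1, x2, x3, x4), as listed in the example.
PACKETS = {
    "y1": (1, 0, 0, 0), "y2": (0, 1, 0, 0), "y3": (0, 0, 1, 0),
    "y4": (1, 1, 0, 0), "y5": (0, 0, 0, 1), "y6": (1, 1, 0, 0),
    "y7": (1, 0, 0, 0), "y8": (0, 0, 1, 0), "y9": (0, 1, 0, 1),
    "y10": (0, 1, 0, 0), "y11": (1, 0, 0, 1),
}
NODES = {
    "U1": ("y1", "y2"), "U2": ("y3", "y4"), "U3": ("y5", "y6"),
    "U4": ("y7", "y8", "y9"), "U5": ("y10", "y11"),
}


def add(u, v, p):
    """Coordinate-wise sum of two coefficient vectors modulo p."""
    return tuple((a + b) % p for a, b in zip(u, v))


def rank_mod_p(rows, p):
    """Rank of a list of vectors over GF(p) by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def can_reconstruct(node_set, p=2, file_size=4):
    """True if the packets stored on node_set span all file_size message symbols."""
    rows = [PACKETS[y] for u in node_set for y in NODES[u]]
    return rank_mod_p(rows, p) == file_size


alpha = [len(NODES[u]) for u in ("U1", "U2", "U3", "U4", "U5")]
print(alpha)                                                   # [2, 2, 2, 3, 2]
# Exact repair of U5 from U1 and U3: y10 = y2 and y11 = y1 + y5.
print(PACKETS["y10"] == PACKETS["y2"])                         # True
print(add(PACKETS["y1"], PACKETS["y5"], 2) == PACKETS["y11"])  # True
print(can_reconstruct(("U1", "U4")), can_reconstruct(("U1",)))  # True False
```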

Contribution: In this paper, we calculate the min-cut bound for the considered heterogeneous DSS. For such a heterogeneous DSS, we establish a bi-objective linear programming problem subject to the min-cut bound. The solutions of the LP problem are plotted as a tradeoff curve between system storage cost and system repair cost. In a heterogeneous DSS, the system storage cost and the system repair cost are the average costs to store and to repair unit information data on a node, respectively. We plot some tradeoff curves and compare them with the tradeoff for the heterogeneous DSS considered in [25] and for the homogeneous DSS investigated in [7]. Some specific cases of the established bi-objective optimization problem are also investigated.

Organization: The paper is organized as follows. Section II collects the required preliminary concepts and describes our model. Section III investigates the min-cut bound for our model. Under the constraint of the min-cut bound, we also establish a bi-objective linear optimization problem to plot the tradeoff curve between storage and repair costs per node. Finally, Section IV concludes the paper with general remarks.

II. Preliminaries 

In this paper, we focus our attention on a heterogeneous DSS with parameters $(n, k, d)$, where the file is distributed among $n$ nodes, $k = \max_t\{k_t\}$ is the maximum reconstruction degree of the file, $d_i = \max_t\{d_i^t\}$ is the maximum repair degree of node $U_i$ over all time, and $d = \max_i\{d_i\}$ is the maximum repair degree among all nodes at any time. For each time $t$, one can define a reconstruction set $A_t$ as the collection of nodes having sufficient packets to reconstruct the file, i.e., $A_t = \{U_{t_1}, U_{t_2}, \dots, U_{t_{k_t}}\}$. Clearly $|A_t| = k_t$, and the intersection of two reconstruction sets may be non-empty. Define $\mathcal{A} = \{A_1, A_2, \dots, A_t, \dots\}$ as the set of all reconstruction sets. Note that $\mathcal{A}$ is finite if all reconstruction sets $A \in \mathcal{A}$ are distinct, since there are at most $2^n$ subsets of nodes. Hence $\exists\, \omega \in \mathbb{N}$ such that $|\mathcal{A}| = \omega$. For the example considered in Figure 2, $\mathcal{A} = \{A_i : i \in [7]\}$, where $A_1 = \{U_1, U_2, U_3\}$, $A_2 = \{U_1, U_3, U_5\}$, $A_3 = \{U_1, U_4\}$, $A_4 = \{U_2, U_4\}$, $A_5 = \{U_2, U_5\}$, $A_6 = \{U_3, U_4\}$ and $A_7 = \{U_4, U_5\}$.

In the heterogeneous DSS, at time $t$, if a node $U_i$ ($i \in [n]$) fails, then certain nodes, called helper nodes, download the required packets and generate a new node, say $U_i'$. The new node $U_i'$ replaces the failed node $U_i$ and the system is repaired. The set of these helper nodes is called a surviving set. For a node $U_i$, let $\tau_i$ be the number of distinct surviving sets. At time instant $t$, indexing the surviving sets by $\ell$, one can denote them by $S_i^{(\ell)} = \{U_j : \text{some } j \in \{1, 2, \dots, n\}\setminus\{i\}\}$, where $i \in [n]$ and $\ell \in [\tau_i]$. If a failed node $U_i$ is repaired by the nodes of surviving set $S_i^{(\ell)}$, then the repair degree at time instant $t$ is $d_i^t = |S_i^{(\ell)}|$. The surviving sets for the heterogeneous DSS considered in Figure 2 are listed in Table I. In this example, one can see that if node $U_4$ fails, it can be repaired by connecting to nodes $U_2$ and $U_3$ or to nodes $U_2$ and $U_5$. Hence the surviving sets for node $U_4$ are $S_4^{(1)} = \{U_2, U_3\}$ and $S_4^{(2)} = \{U_2, U_5\}$. In Table I, for a given $i$ ($i \in [5]$), $|S_i^{(\ell)}|$ is identical for all $\ell \in [\tau_i]$; in general, this need not be true. Also note that in the table we have chosen only those surviving sets that are not supersets of another surviving set for the same failed node. This condition ensures the active participation of each node of an arbitrary surviving set during the repair process.


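TABLE I. Surviving sets for the nodes of the DSS considered in Figure 2 ($U_?$ marks an index that is illegible in the source)

| Node $U_i$ | Surviving sets | # sets $\tau_i$ |
|---|---|---|
| $U_1$ | $S_1^{(1)} = \{U_2, U_?\}$, $S_1^{(2)} = \{U_2, U_5\}$, $S_1^{(3)} = \{U_?, U_?\}$, $S_1^{(4)} = \{U_?, U_?\}$, $S_1^{(5)} = \{U_?, U_?\}$ | 5 |
| $U_2$ | $S_2^{(1)} = \{U_?, U_?\}$, $S_2^{(2)} = \{U_3, U_?\}$, $S_2^{(3)} = \{U_?, U_?\}$ | 3 |
| $U_3$ | $S_3^{(1)} = \{U_4\}$, $S_3^{(2)} = \{U_5\}$ | 2 |
| $U_4$ | $S_4^{(1)} = \{U_2, U_3\}$, $S_4^{(2)} = \{U_2, U_5\}$ | 2 |
| $U_5$ | $S_5^{(1)} = \{U_4\}$, $S_5^{(2)} = \{U_?\}$ | 2 |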


In brief, for a failed node $U_i$, if the system is repaired by the nodes of a specific surviving set $S_i^{(\ell)}$, then the number of information packets downloaded from node $U_j \in S_i^{(\ell)}$ is denoted by $\beta(U_i, U_j, S_i^{(\ell)}) > 0$. For example, in Figure 2, both packets of node $U_5$ and packet $y_3 = x_3$ of node $U_2$ are downloaded to repair the failed node $U_4$. Note that $U_2, U_5 \in S_4^{(2)}$. Hence
$$\beta\big(U_4, U_5, S_4^{(2)}\big) = 2 \quad \text{and} \quad \beta\big(U_4, U_2, S_4^{(2)}\big) = 1.$$


If a failed node $U_i$ ($i \in [n]$) is repaired by the nodes of surviving set $S_i^{(\ell)}$, then the repair bandwidth of node $U_i$, denoted by $\gamma(U_i, S_i^{(\ell)})$, is the total number of packets downloaded from all nodes of the surviving set $S_i^{(\ell)}$. Mathematically,
$$\gamma\big(U_i, S_i^{(\ell)}\big) = \sum_{j:\ U_j \in S_i^{(\ell)}} \beta\big(U_i, U_j, S_i^{(\ell)}\big). \quad (1)$$
For example, in Figure 2, if node $U_5$ fails and it is repaired by the nodes of surviving set $S_5^{(3)} = \{U_1, U_3\}$ (not considered in Table I since $S_5^{(2)} \subset S_5^{(3)}$), then
$$\gamma\big(U_5, S_5^{(3)}\big) = \beta\big(U_5, U_1, S_5^{(3)}\big) + \beta\big(U_5, U_3, S_5^{(3)}\big) = 2 + 1 = 3 \text{ units}.$$
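Equation (1) is a plain sum over the helpers of one surviving set. A minimal Python sketch, using only the $\beta$ values quoted in the two examples above (the dictionary layout and labels such as `"S5^(3)"` are illustrative choices, not notation from the paper):

```python
# beta[(failed node, helper, surviving-set label)] = packets downloaded from helper
BETA = {
    ("U4", "U2", "S4^(2)"): 1,  # y3 = x3 from U2
    ("U4", "U5", "S4^(2)"): 2,  # both packets of U5
    ("U5", "U1", "S5^(3)"): 2,  # y1, y2 from U1
    ("U5", "U3", "S5^(3)"): 1,  # y5 from U3
}
# surviving-set label -> (failed node, helper nodes)
SURVIVING = {
    "S4^(2)": ("U4", ("U2", "U5")),
    "S5^(3)": ("U5", ("U1", "U3")),
}


def repair_bandwidth(label):
    """gamma(U_i, S_i^(l)): sum of beta over all helpers of the surviving set, Eq. (1)."""
    failed, helpers = SURVIVING[label]
    return sum(BETA[(failed, h, label)] for h in helpers)


print(repair_bandwidth("S5^(3)"))  # 3
print(repair_bandwidth("S4^(2)"))  # 3
```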

Remark 1. In this paper, a single node failure at time instant $t$ is considered, because simultaneous multi-node failures can be treated as a sequence of single node failures.

Remark 2. One can find the tradeoff curve between repair cost and storage cost using optimization Problem 17 for exact or functional repair, by taking as surviving sets the collections of helper nodes that repair failed nodes exactly or functionally, respectively.

Remark 3. One can modify our heterogeneous DSS model by allowing several data collectors to reconstruct the file separately at the same time instant $t$, each with its own flexible reconstruction degree. For this modified model, the tradeoff curve between repair and storage cost can be plotted using optimization Problem 17 if no two data collectors are connected to a common node.

Fig. 3. Information flow graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ for a heterogeneous DSS. The graph is divided into $k_t + 3$ step labels ($k_t$ is the flexible reconstruction degree at time $t$ associated with data collector node "D"). Node $U_{\lambda_i}$ ($\lambda_i \in [n]$) of the heterogeneous DSS is represented by a pair of nodes $In_{\lambda_i}$ and $Out_{\lambda_i}$. Source node "s" is in step label $-1$, and data collector node "D" is in step label $k_t + 1$. Step label 0 has $n$ pairs of nodes $In_{\lambda_i}$, $Out_{\lambda_i}$. Each step from label 1 to $k_t$ has one pair of nodes $In'_{\lambda_j}$ and $Out'_{\lambda_j}$ ($j \in [k_t]$). Note that node "D" (data collector node) at the upper right corner is in step label $k_t + 1$, not in step label $k_t$.

To plot the tradeoff curve between storage capacity $\alpha$ and repair bandwidth $d\beta$ in a homogeneous DSS, Wu et al. [7] solved an optimization problem constrained by the min-cut bound between these parameters. The bound is calculated by analyzing the information flow graph of the homogeneous DSS [?]. In a similar manner, one can plot the tradeoff curve for our model. We consider the information flow graph (an acyclic weighted directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$) [24], [25] for the heterogeneous DSS, as described in Figure 3.

For a heterogeneous DSS at time $t$, the information flow graph $\mathcal{G}$ shown in Figure 3 is divided into $k_t + 3$ steps ($k_t$ being the flexible reconstruction degree of the data collector at time $t$), from step label $-1$ to step label $k_t + 1$. Step label $-1$ contains the source node, say "s", and step label $k_t + 1$ contains the data collector node, say "D". A node $U_{\lambda_i}$ ($\forall i \in [n]$) of the heterogeneous DSS is mapped to a pair of vertices $In_{\lambda_i}$ and $Out_{\lambda_i}$ in $\mathcal{V}$ s.t. $(In_{\lambda_i}, Out_{\lambda_i}) \in \mathcal{E}$, where $\lambda$ is a permutation on the node indices. The storage capacity $\alpha_{\lambda_i}$ of node $U_{\lambda_i}$ is mapped to $w(In_{\lambda_i}, Out_{\lambda_i})$, the weight associated with edge $(In_{\lambda_i}, Out_{\lambda_i}) \in \mathcal{E}$. In graph $\mathcal{G}$ as given in Figure 3, step label 0 contains $2n$ vertices, named $In_{\lambda_i}$ and $Out_{\lambda_i}$, associated with the nodes $U_{\lambda_i}$ ($i \in [n]$) of the heterogeneous DSS.

A failed node $U_{\lambda_i}$ ($i \in [n]$) of the heterogeneous DSS is repaired by generating a new node $U'_{\lambda_i}$. The node $U'_{\lambda_i}$ is mapped to a new pair of vertices $In'_{\lambda_i}$ and $Out'_{\lambda_i}$ s.t. $(In'_{\lambda_i}, Out'_{\lambda_i}) \in \mathcal{E}$ with $w(In'_{\lambda_i}, Out'_{\lambda_i}) = \alpha_{\lambda_i}$. Every step label $j \in [k_t]$ contains one pair of vertices $In'_{\lambda_j}$ and $Out'_{\lambda_j}$. As shown in Figure 3, the system is repaired for the failure of node $U_{\lambda_j}$ by downloading $\beta(U_{\lambda_j}, U_{\mu_i}, S_{\lambda_j}^{(\ell)})$ units of data from every node $U_{\mu_i} \in S_{\lambda_j}^{(\ell)} = \{U_{\mu_i} : i \in [d_{\lambda_j}^t]\}$, where $\mu$ is some permutation on the nodes. Each such download is represented by one distinct edge from some previous step label to step label $j$, s.t. $(Out'_{\mu_i}, In'_{\lambda_j}) \in \mathcal{E}$ with $w(Out'_{\mu_i}, In'_{\lambda_j}) = \beta(U_{\lambda_j}, U_{\mu_i}, S_{\lambda_j}^{(\ell)})$. In particular, if vertex $Out'_{\mu_i}$ does not exist, then $Out_{\mu_i}$ from step label 0 is used instead, s.t. $(Out_{\mu_i}, In'_{\lambda_j}) \in \mathcal{E}$. In graph $\mathcal{G}$, exactly one node failure is considered in each step label.

Fig. 4. For the heterogeneous DSS considered in Figure 2, an information flow graph is shown for a specific data collector that connects to the nodes of $A_1 = \{U_1, U_2, U_3\}$. This information flow graph is plotted for the surviving sequence $(S_1^{(\ell_1)}, S_2^{(\ell_2)}, S_3^{(\ell_3)}) \in \mathcal{S}((U_1, U_2, U_3))$. Note that node "D" at the upper right corner is in step label 4, not in step label 3.
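The construction rules above can be sketched as a function that emits the weighted edge list for one failure sequence. This is an illustrative Python sketch, not the authors' code: the source edges $s \to In_{\lambda_i}$ are given infinite capacity (an assumption, since the text does not state their weight), vertex names such as `In'_U1@1` are hypothetical, and the $\beta$ values in the usage are made up. The helper sets $\{U_2, U_4\}$, $\{U_1, U_4\}$ and $\{U_4\}$ are one choice consistent with the terms of Example 16.

```python
INF = float("inf")


def build_flow_graph(alpha, failures, beta):
    """Edge list (u, v, capacity) of an information flow graph.

    alpha:    dict node -> storage capacity alpha_i
    failures: list of (failed_node, helpers), one entry per step label 1..k_t
    beta:     dict (failed_node, helper) -> amount downloaded from helper
    """
    edges, latest_out = [], {}
    for u, a in alpha.items():                        # step labels -1 and 0
        edges.append(("s", f"In_{u}", INF))           # assumption: unbounded source edges
        edges.append((f"In_{u}", f"Out_{u}", a))      # storage edge of weight alpha_i
        latest_out[u] = f"Out_{u}"
    for step, (u, helpers) in enumerate(failures, start=1):
        new_in, new_out = f"In'_{u}@{step}", f"Out'_{u}@{step}"
        for h in helpers:                             # newest copy of each helper:
            edges.append((latest_out[h], new_in, beta[(u, h)]))  # Out' if it exists, else Out
        edges.append((new_in, new_out, alpha[u]))     # regenerated node keeps alpha_i
        edges.append((new_out, "D", INF))             # data collector edge
        latest_out[u] = new_out
    return edges


alpha = {"U1": 2, "U2": 2, "U3": 2, "U4": 3, "U5": 2}
failures = [("U1", ("U2", "U4")), ("U2", ("U1", "U4")), ("U3", ("U4",))]
beta = {("U1", "U2"): 1, ("U1", "U4"): 1, ("U2", "U1"): 1,
        ("U2", "U4"): 1, ("U3", "U4"): 2}             # hypothetical values
edges = build_flow_graph(alpha, failures, beta)
print(len(edges))                                     # 21
```

Because `latest_out` is updated after every step, the repair of $U_2$ at step 2 draws from the regenerated vertex of $U_1$ rather than its step-0 copy, which is the "previous step label" rule in the text.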


A data collector $D$ connects to the $k_t$ nodes of $A_t = \{U_{\lambda_1}, \dots, U_{\lambda_{k_t}}\}$. In the information flow graph of Figure 3, data collector $D$ connects to the vertices $Out'_{\lambda_j}$ ($\forall j \in [k_t]$) from step label 1 to step label $k_t$ and downloads data from them; accordingly, $(Out'_{\lambda_j}, D) \in \mathcal{E}$ with $w(Out'_{\lambda_j}, D) \to \infty$.

Example 4. At time instant $t$, for the heterogeneous DSS in Figure 2, an example of an information flow graph is shown in Figure 4. In particular, a data collector is connected to the nodes of $A_1 = \{U_1, U_2, U_3\}$. In this information flow graph, when these nodes fail, they are repaired by the nodes of $S_1^{(\ell_1)}$, $S_2^{(\ell_2)}$ and $S_3^{(\ell_3)}$, respectively.

In [?], the min-cut bound is calculated by analyzing the flow that passes from source node $s$ to data collector node $D$ across the information flow graph of a DSS. The flow analysis for the model considered in this paper is done in a similar manner. Hence one can define a flow across the information flow graph as follows.

Definition 5. A function $f : \mathcal{E} \to [0, \infty)$ is called a flow on an information flow graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ if

1) (capacity constraint:) $\forall (x, y) \in \mathcal{E}$, $f((x, y)) \le c((x, y))$, where $c((x, y)) = w(x, y)$ is the capacity of edge $(x, y)$;

2) (flow conservation constraint:) $\forall y \in \mathcal{V}\setminus\{s, D\}$,
$$\sum_{(x, y) \in \mathcal{E}} f((x, y)) = \sum_{(y, z) \in \mathcal{E}} f((y, z)).$$

For more details and examples on flow functions, see [26], [?].

For a given information flow graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the value of the flow delivered to a data collector node $D$ is defined as the total amount of flow passing through the edges $(x, D) \in \mathcal{E}$ over all $x \in \mathcal{V}$. For networks, the maximum possible value of the flow delivered to $D$ is governed by the max-flow min-cut theorem [?], [28]. The theorem states that the maximum possible value of the flow from source $s$ to a specific data collector $D$ across the network, denoted by max-flow$(s, D)$, is equal to min cut-capacity$(s, D)$, where
$$\min \text{cut-capacity}(s, D) = \min_{\mathrm{cut}(X, \bar{X}):\ s \in X,\ D \in \bar{X}} \big\{\text{cut-capacity}(X, \bar{X})\big\}.$$
Here $\mathrm{cut}(X, \bar{X})$ is the set of all edges with one end vertex in $X$ and the other in $\bar{X}$, such that removing all of them increases the number of components of $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, and cut-capacity$(X, \bar{X})$ is the sum of the capacities of all edges in $\mathrm{cut}(X, \bar{X})$. At time $t$, for a specific data collector $D$ that connects to the nodes $U_{\lambda_i}$ of a set $A_t \in \mathcal{A}$, there exist $|A_t|!\prod_{i=1}^{|A_t|}\tau_{\lambda_i}$ distinct information flow graphs. For every information flow graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, $D$ can recover the whole file of size $B$, so
$$B \le \min_{\mathcal{G}} \text{max-flow}(s, D).$$
By the max-flow min-cut theorem, for an arbitrary data collector $D$ one can therefore write
$$B \le \min_{t}\ \min_{\mathcal{G}} \text{max-flow}(s, D).$$
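For a concrete graph, max-flow$(s, D)$ in these inequalities can be computed with any standard augmenting-path algorithm. The Python sketch below uses Edmonds-Karp (breadth-first augmenting paths). It assumes every $s$-$D$ path contains at least one finite-capacity edge, which holds for the graphs described above because every path crosses some storage or repair edge. The toy graph is hypothetical: $n = 3$ nodes with $\alpha = 2$, node $U_1$ repaired from $U_2$ and $U_3$ with $\beta = 1$ each, and $D$ attached to $Out'_1$ and $Out_2$.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(edges, s, t):
    """Edmonds-Karp: augment along shortest residual s-t paths until none is left."""
    cap = defaultdict(lambda: defaultdict(float))    # residual capacities
    for u, v, c in edges:
        cap[u][v] += c
    total = 0.0
    while True:
        parent = {s: None}                           # BFS tree for one path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                          # no augmenting path left
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:                            # update the residual graph
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        total += bottleneck


toy_edges = [
    ("s", "In1", INF), ("s", "In2", INF), ("s", "In3", INF),
    ("In1", "Out1", 2), ("In2", "Out2", 2), ("In3", "Out3", 2),
    ("Out2", "In1'", 1), ("Out3", "In1'", 1), ("In1'", "Out1'", 2),
    ("Out1'", "D", INF), ("Out2", "D", INF),
]
print(max_flow(toy_edges, "s", "D"))                 # 3.0
```

For this toy graph the value 3 coincides with the homogeneous cut-set value $\min\{2, 2\cdot 1\} + \min\{2, 1\cdot 1\}$; the matching cut separates $\{Out_2, In'_1, Out'_1, D\}$ from the rest.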


In [?], for an information flow graph, the flow analysis is done by taking a topological order of the failed nodes connected to the data collector. In this paper, we define some sequences of nodes and corresponding surviving sets for our model to analyze the flow. The definitions are as follows.

Definition 6. The set of all possible sequences of nodes in a reconstruction set $A_j \in \mathcal{A}$ is called the reconstruction sequence set and is denoted by $\mathcal{U}(A_j) = \{(U_{\lambda_i})_{i=1}^{|A_j|}\}$, where $(U_{\lambda_i})_{i=1}^{|A_j|}$ represents a sequence of distinct nodes of $A_j \in \mathcal{A}$. Clearly $|\mathcal{U}(A_j)| = |A_j|!$.

For example, in Figure 2, $\mathcal{U}(A_3) = \{(U_1, U_4), (U_4, U_1)\}$, etc.

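Definition 7. For a reconstruction set $A_j \in \mathcal{A}$, one can define sequences of surviving sets $(S_{\lambda_i}^{(\ell_i)})_{i=1}^{|A_j|}$ ($i \in [|A_j|]$, $\ell_i \in [\tau_{\lambda_i}]$) such that $U_{\lambda_i} \in A_j$. The surviving sequence associated with the node sequence $(U_{\lambda_i})_{i=1}^{|A_j|} \in \mathcal{U}(A_j)$ is denoted by $(S_{\lambda_i}^{(\ell_i)})_{i=1}^{|A_j|}$.

For example, in Figure 2, a possible surviving sequence for the node sequence $(U_1, U_4)$ is $(S_1^{(\ell_1)}, S_4^{(\ell_2)})$ with $\ell_1 \in [5]$ and $\ell_2 \in [2]$.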

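Definition 8. The set of all surviving sequences associated with a node sequence $(U_{\lambda_i})_{i=1}^{|A_j|}$ is defined as
$$\mathcal{S}\big((U_{\lambda_i})_{i=1}^{|A_j|}\big) = \Big\{(S_{\lambda_i}^{(\ell_i)})_{i=1}^{|A_j|} : \ell_i \in [\tau_{\lambda_i}]\Big\}.$$
Clearly
$$\Big|\mathcal{S}\big((U_{\lambda_i})_{i=1}^{|A_j|}\big)\Big| = \prod_{i=1}^{|A_j|} \tau_{\lambda_i}.$$

For example, in Figure 2, one can see that $\mathcal{S}((U_1, U_4)) = \{(S_1^{(\ell_1)}, S_4^{(\ell_2)}) : \ell_1 \in [5],\ \ell_2 \in [2]\}$, etc.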
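Definitions 6 to 8 amount to a permutation and a Cartesian product, so both sets are easy to enumerate. A short Python sketch using the $\tau_i$ values of Table I; representing a surviving sequence by its label tuple $(\ell_1, \dots, \ell_k)$ is an illustrative encoding, not the paper's notation:

```python
from itertools import permutations, product

TAU = {"U1": 5, "U2": 3, "U3": 2, "U4": 2, "U5": 2}  # number of surviving sets, Table I


def reconstruction_sequences(A):
    """U(A_j): all orderings of the nodes of a reconstruction set (Definition 6)."""
    return list(permutations(A))


def surviving_sequences(node_seq, tau):
    """S((U_lambda_i)): all label tuples (l_1, ..., l_k) with l_i in [tau_lambda_i] (Definition 8)."""
    return list(product(*(range(1, tau[u] + 1) for u in node_seq)))


A3 = ("U1", "U4")
print(reconstruction_sequences(A3))                 # [('U1', 'U4'), ('U4', 'U1')]
print(len(surviving_sequences(("U1", "U4"), TAU)))  # 10 = tau_1 * tau_4
```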

In [25], Quan et al. gave the tradeoff curve between system storage cost and system repair cost for heterogeneous DSSs with uniform reconstruction degree. Similarly, one can obtain the tradeoff curve between system storage cost and system repair cost for the heterogeneous DSS model considered in this paper. For our model, we define the system storage cost, node repair cost and system repair cost as follows.

Definition 9. (System storage cost): The total cost $C_s(\alpha)$ to store unit data in a heterogeneous DSS$(n, k, d)$ is called the system storage cost, where $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)$ is the storage amount vector, $s = (s_1, s_2, \dots, s_n)$ is the storage cost vector, $\alpha_i$ is the storage capacity of node $U_i$ and $s_i$ is the cost to store unit information data in node $U_i$ ($\forall i \in [n]$). Clearly,
$$C_s(\alpha) = \frac{1}{B}\sum_{i=1}^{n} s_i\,\alpha_i.$$

The system storage cost $C_s(\alpha)$ for the example considered in Figure 2 with $s = (100, 10, 10, 10, 1)$ is $(100\cdot 2 + 10\cdot 2 + 10\cdot 2 + 10\cdot 3 + 1\cdot 2)/4 = 68$ cost units.
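As a check of Definition 9 on the Figure 2 data, the following Python sketch computes $C_s(\alpha)$. The normalization by $B$ is inferred from the reported value, since $272/4 = 68$; the function name is illustrative.

```python
def system_storage_cost(alpha, s, B):
    """C_s(alpha) = (1/B) * sum_i s_i * alpha_i (Definition 9)."""
    return sum(si * ai for si, ai in zip(s, alpha)) / B


alpha = (2, 2, 2, 3, 2)      # storage vector of Figure 2
s = (100, 10, 10, 10, 1)     # storage cost vector used in the text
print(system_storage_cost(alpha, s, B=4))  # 68.0
```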

Definition 10. (Node repair cost): The average cost to repair a node $U_i$ ($i \in [n]$) in a heterogeneous DSS$(n, k, d)$ is called the node repair cost associated with the repair cost vector $r = (r_1, r_2, \dots, r_n)$, s.t.
$$r(\beta_i) = \frac{1}{\tau_i}\sum_{\ell=1}^{\tau_i}\ \sum_{j:\ U_j \in S_i^{(\ell)}} r_j\,\beta\big(U_i, U_j, S_i^{(\ell)}\big),$$
where $r_j$ is the cost to download a unit amount of data from node $U_j$ during the repair process. The node repair cost vector is $r(\beta) = (r(\beta_1), r(\beta_2), \dots, r(\beta_n))$.

In the example considered in Figure 2, if $r = (10, 1, 1, 1, 1)$, then the node repair cost vector is $r(\beta) = (r(\beta_1), r(\beta_2), r(\beta_3), r(\beta_4), r(\beta_5)) = (\dots)$.

Definition 11. (System repair cost): The system repair cost is the total cost to repair all nodes in a heterogeneous DSS$(n, k, d)$ and is denoted by $C_r(\beta)$. Mathematically,
$$C_r(\beta) = \sum_{i=1}^{n} r(\beta_i).$$
For $r = (10, 1, 1, 1, 1)$ in the example considered in Figure 2, $C_r(\beta) = \dots$ cost units.
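Definitions 10 and 11 can be sketched in Python as follows, assuming the averaging over the $\tau_i$ surviving sets written above. Only the $\beta$ values for $S_4^{(2)} = \{U_2, U_5\}$ come from the text; those for $S_4^{(1)} = \{U_2, U_3\}$ are hypothetical, so the resulting number illustrates the computation rather than reproducing the paper's values.

```python
from fractions import Fraction


def node_repair_cost(surviving_sets, beta, r):
    """r(beta_i): average over the tau_i surviving sets of sum_j r_j * beta(U_i, U_j, S)."""
    per_set = [sum(r[h] * beta[(S, h)] for h in S) for S in surviving_sets]
    return Fraction(sum(per_set), len(per_set))


def system_repair_cost(node_costs):
    """C_r(beta) = sum_i r(beta_i) (Definition 11)."""
    return sum(node_costs)


r = {"U1": 10, "U2": 1, "U3": 1, "U4": 1, "U5": 1}
S4_1 = frozenset({"U2", "U3"})   # beta values for this set are hypothetical
S4_2 = frozenset({"U2", "U5"})   # beta values taken from the example in the text
beta_U4 = {(S4_1, "U2"): 1, (S4_1, "U3"): 1, (S4_2, "U2"): 1, (S4_2, "U5"): 2}
print(node_repair_cost([S4_1, S4_2], beta_U4, r))  # 5/2
```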


Now we give some results and analysis for the min-cut bound for the model of heterogeneous DSS considered in this paper in the next section.

III. Results

For our model, it is shown that the minimum possible value of the flexible reconstruction degree is a lower bound on the cardinality of any cut set that separates the source node from the data collector node. For the heterogeneous DSS, the min-cut bound is calculated in Theorem 13. Using that bound, it is shown that the file size is upper bounded by the min-cut bound of the heterogeneous DSS. Using this bound as a constraint, a bi-objective linear programming problem is formulated to minimize the system storage cost and the system repair cost for the considered heterogeneous model. A family of solutions of the optimization problem is calculated by substituting numerical values for the system parameters, and these solutions are plotted as the tradeoff curve between system storage cost and system repair cost. The curve is compared with the tradeoff curve for homogeneous DSSs [7] and the tradeoff curve for heterogeneous DSSs [25].

Lemma 12. An arbitrary information flow graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with source node $s \in X$ and data collector node $D \in \bar{X}$ associated with flexible reconstruction degree $k_t$ satisfies
$$\min_{X \subset \mathcal{V}}\big\{|\mathrm{cut}(X, \bar{X})| : \mathrm{cut}(X, \bar{X}) \neq \emptyset\big\} \ge \min_t\{k_t\},$$
where $X \cup \bar{X} = \mathcal{V}$.

Proof: Consider a heterogeneous DSS associated with some information flow graphs. For an arbitrary information flow graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, $\exists\, X \subset \mathcal{V}$ such that $s \in X$ and $D \in \bar{X}$. Since the information flow graph is connected, $\mathrm{cut}(X, \bar{X}) \neq \emptyset$ for any nonempty sets $X$ and $\bar{X}$. To retrieve the distributed file in the case $\bar{X} = \{D\}$, one has to connect to at least $\min_t\{k_t\}$ of the $n$ nodes. Hence each node of an arbitrary set of $\min_t\{k_t\}$ such nodes holds encoded data of a distinct part of the message data, so at least $\min_t\{k_t\}$ edges have node $D$ as an end point. Mathematically, $|\mathrm{cut}(X, \bar{X})| \ge \min_t\{k_t\}$ for $\bar{X} = \{D\}$. If $|\bar{X}| > 1$, then some edges in $\mathrm{cut}(X, \bar{X})$ represent downloads for system repair. In particular, a failed node among the $\min_t\{k_t\}$ specific nodes cannot be repaired by any subset of the remaining $\min_t\{k_t\} - 1$ nodes, because each node in the set of $\min_t\{k_t\}$ nodes holds encoded packets of some unique message packets. Hence there must exist some helper nodes other than those $\min_t\{k_t\} - 1$ nodes to repair the failed node, and so $|\mathrm{cut}(X, \bar{X})| \ge \min_t\{k_t\}$. Since $X$ is an arbitrary nonempty subset for which $\mathrm{cut}(X, \bar{X})$ exists, $|\mathrm{cut}(X, \bar{X})| \ge \min_t\{k_t\}$ for all possible $\mathrm{cut}(X, \bar{X}) \neq \emptyset$. This proves the lemma. ■

In a heterogeneous DSS, the information delivered to data collector $D$ depends on min cut-capacity$(s, D)$. Theorem 13 gives a lower bound on min cut-capacity$(s, D)$.

Theorem 13. (min-cut bound) For a given heterogeneous DSS with an arbitrary data collector $D$ associated with flexible reconstruction degree $k_t$, min cut-capacity$(s, D)$ is bounded below by $\mathcal{B}$ as given by Equation (5), i.e.,
$$\min\ \text{cut-capacity}(s, D) \ge \mathcal{B}. \quad (3)$$







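Proof: Consider a heterogeneous DSS $(n, k, d)$ associated with some information flow graphs. Every information flow graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ has a source node $s$ and a data collector node $D$ associated with effective reconstruction degree $k_t$. In the heterogeneous DSS, a failed node $U_i$ can be repaired by the nodes of some surviving set $S_i^{(\ell)}$, where $\ell \in [\tau_i]$.

Let $X \subset \mathcal{V}$ with $X \cup \bar{X} = \mathcal{V}$, $s \in X$ and $D \in \bar{X}$, such that some nonempty $\mathrm{cut}(X, \bar{X}) \subset \mathcal{E}$ exists. If $X = \mathcal{V}\setminus\{D\}$, then cut-capacity$(X, \bar{X}) \to \infty$. Similarly, if $X = \{s\}$, then again cut-capacity$(X, \bar{X}) \to \infty$. Hence the minimum cut-capacity$(X, \bar{X})$ is attained by cuts with $Out'_j \in \bar{X}$ and $In_i \in X$, since only these give a finite cut-capacity$(X, \bar{X})$, where $i \in [n]$ and $j \in [k_t]$.

The information flow graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a directed acyclic graph, so its vertices can be arranged in a topological order. For this order, the sequence of node failures and the corresponding sequence of surviving sets are arranged using the definitions of the previous section. Assume that at time $t$, data collector $D$ connects to all nodes of a set $A_t \in \mathcal{A}$ and reconstructs the file. $\mathcal{U}(A_t)$ is the set of all possible sequences of nodes of $A_t$, and a sequence $(U_{\lambda_i})_{i=1}^{k_t} \in \mathcal{U}(A_t)$ represents the order of node failures within $A_t$. Recall that $\mathcal{S}((U_{\lambda_i})_{i=1}^{k_t})$ is the set of all possible surviving sequences associated with this node sequence. For a specific node sequence $(U_{\lambda_i})_{i=1}^{k_t}$ with a specific surviving sequence $(S_{\lambda_i}^{(\ell_i)})_{i=1}^{k_t} \in \mathcal{S}((U_{\lambda_i})_{i=1}^{k_t})$, one can analyze the following.

For $Out'_{\lambda_1} \in \bar{X}$, associated with the first node $U_{\lambda_1}$ of the node sequence, the following two cases are possible.

• If $In'_{\lambda_1} \in X$, then edge $(In'_{\lambda_1}, Out'_{\lambda_1}) \in \mathrm{cut}(X, \bar{X})$. Hence $\alpha_{\lambda_1}$ contributes to cut-capacity$(X, \bar{X})$.

• If $In'_{\lambda_1} \in \bar{X}$, then the edges $(Out_{\mu_j}, In'_{\lambda_1}) \in \mathrm{cut}(X, \bar{X})$, where $U_{\mu_j} \in S_{\lambda_1}^{(\ell_1)}$ for some $\ell_1 \in [\tau_{\lambda_1}]$. Hence this case contributes $\sum_{U_{\mu_j} \in S_{\lambda_1}^{(\ell_1)}} \beta(U_{\lambda_1}, U_{\mu_j}, S_{\lambda_1}^{(\ell_1)})$ to cut-capacity$(X, \bar{X})$.

So the contribution of node $U_{\lambda_1}$ to min cut-capacity$(X, \bar{X})$ is
$$\min\Big\{\alpha_{\lambda_1},\ \sum_{U_{\mu_j} \in S_{\lambda_1}^{(\ell_1)}} \beta\big(U_{\lambda_1}, U_{\mu_j}, S_{\lambda_1}^{(\ell_1)}\big)\Big\}.$$

If a node $U_p$ ($p \in [n]$) fails, then the nodes of some surviving set $S_p^{(\ell)}$ generate a new node $U'_p$ with the same characteristics, and at any time instant only one of $U_p$ and $U'_p$ is in the system. Hence, for the remaining part of the proof, we write $U_p$ in place of $U'_p$.

In general, to compute the contribution of node $U_{\lambda_i}$ of the sequence $(U_{\lambda_i})_{i=1}^{k_t}$ to min cut-capacity$(X, \bar{X})$, assume $Out'_{\lambda_i} \in \bar{X}$. Again, the following two cases are possible.

• If $In'_{\lambda_i} \in X$, then edge $(In'_{\lambda_i}, Out'_{\lambda_i}) \in \mathrm{cut}(X, \bar{X})$. Hence $\alpha_{\lambda_i}$ contributes to cut-capacity$(X, \bar{X})$.

• If $In'_{\lambda_i} \in \bar{X}$, then all edges $(Out_{\mu_j}, In'_{\lambda_i})$ with $U_{\mu_j} \in S_{\lambda_i}^{(\ell_i)}\setminus\{U_{\lambda_1}, U_{\lambda_2}, \dots, U_{\lambda_{i-1}}\}$, for the chosen $\ell_i \in [\tau_{\lambda_i}]$, belong to $\mathrm{cut}(X, \bar{X})$; these edges are newly investigated from step label 0. The edges $(Out'_{\lambda_m}, In'_{\lambda_i})$, where $m \in [i-1]$ and $U_{\lambda_m} \in S_{\lambda_i}^{(\ell_i)}$, must be excluded: they were already investigated at step label $m$, and since $Out'_{\lambda_m} \in \bar{X}$ they do not cross the cut. Hence this case contributes
$$\sum_{U_{\mu_j} \in S_{\lambda_i}^{(\ell_i)}\setminus\{U_{\lambda_1}, \dots, U_{\lambda_{i-1}}\}} \beta\big(U_{\lambda_i}, U_{\mu_j}, S_{\lambda_i}^{(\ell_i)}\big)$$
to cut-capacity$(X, \bar{X})$.

So the contribution of node $U_{\lambda_i}$ to min cut-capacity$(X, \bar{X})$ is
$$\min\Big\{\alpha_{\lambda_i},\ \sum_{U_{\mu_j} \in S_{\lambda_i}^{(\ell_i)}\setminus\{U_{\lambda_1}, \dots, U_{\lambda_{i-1}}\}} \beta\big(U_{\lambda_i}, U_{\mu_j}, S_{\lambda_i}^{(\ell_i)}\big)\Big\}.$$

At time instant $t$, if data collector $D$ connects to each node $U_{\lambda_i} \in A_t$, $i \in [k_t]$, then for a specific node sequence $(U_{\lambda_i})_{i=1}^{k_t}$ with a specific surviving sequence $(S_{\lambda_i}^{(\ell_i)})_{i=1}^{k_t}$, the contribution to min cut-capacity$(X, \bar{X})$ is
$$\sum_{i=1}^{k_t} \min\Big\{\alpha_{\lambda_i},\ \sum_{U_{\mu_j} \in S_{\lambda_i}^{(\ell_i)}\setminus\{U_{\lambda_1}, \dots, U_{\lambda_{i-1}}\}} \beta\big(U_{\lambda_i}, U_{\mu_j}, S_{\lambda_i}^{(\ell_i)}\big)\Big\}.$$

Now one can find min cut-capacity$(s, D)$ for a specific $D$ by taking the minimum of these cut-capacities over all node sequences $(U_{\lambda_i})_{i=1}^{k_t} \in \mathcal{U}(A_t)$ and all associated surviving sequences $(S_{\lambda_i}^{(\ell_i)})_{i=1}^{k_t} \in \mathcal{S}((U_{\lambda_i})_{i=1}^{k_t})$. The min cut-capacity$(s, D)$ of the system is the minimum of this quantity over all possible data collectors $D$. Hence, for a given heterogeneous DSS and an arbitrary data collector $D$,
$$\begin{aligned}
\min\ \text{cut-capacity}(s, D) &\ge \min_{A_t \in \mathcal{A}}\ \min_{(U_{\lambda_i})_{i=1}^{k_t} \in \mathcal{U}(A_t)}\ \min_{(S_{\lambda_i}^{(\ell_i)})_{i=1}^{k_t} \in \mathcal{S}((U_{\lambda_i})_{i=1}^{k_t})} \sum_{U_{\lambda_i} \in A_t} \min\Big\{\alpha_{\lambda_i},\ \sum_{U_{\mu_j} \in S_{\lambda_i}^{(\ell_i)}\setminus\{U_{\lambda_1}, \dots, U_{\lambda_{i-1}}\}} \beta\big(U_{\lambda_i}, U_{\mu_j}, S_{\lambda_i}^{(\ell_i)}\big)\Big\}\\
&= \min_{A_t \in \mathcal{A}}\ \min_{(U_{\lambda_i})_{i=1}^{k_t} \in \mathcal{U}(A_t)}\ \min_{(S_{\lambda_i}^{(\ell_i)})_{i=1}^{k_t} \in \mathcal{S}((U_{\lambda_i})_{i=1}^{k_t})} \sum_{i=1}^{k_t} \min\Big\{\alpha_{\lambda_i},\ \sum_{U_{\mu_j} \in S_{\lambda_i}^{(\ell_i)}\setminus\{U_{\lambda_1}, \dots, U_{\lambda_{i-1}}\}} \beta\big(U_{\lambda_i}, U_{\mu_j}, S_{\lambda_i}^{(\ell_i)}\big)\Big\},
\end{aligned} \quad (4)$$
which gives Inequality (3). The equality in (4) holds because the index of the storage node capacity is governed by the index $\lambda_i$ of the nodes in the node sequence $(U_{\lambda_i})_{i=1}^{k_t}$.

The bound is tight, since the min-cut bound is obtained by taking the minimum over all possible cut bounds: it is calculated for all node sequences $(U_{\lambda_i})_{i=1}^{k_t}$ associated with all surviving sequences. Hence there exists at least one node sequence, say $(U_{\lambda_i^*})_{i=1}^{k_t}$, with an associated surviving sequence, say $(S_{\lambda_i^*}^{(\ell_i^*)})_{i=1}^{k_t}$, for which the inequality holds with equality, i.e., the min-cut bound in Inequality (3) is tight. ■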
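Because the bound is a finite minimum, it can be evaluated by brute force for small systems. The Python sketch below enumerates reconstruction sets, node sequences and surviving sequences as in the right-hand side of (4); surviving sets are frozensets and `beta` is keyed by (failed node, helper, surviving set), an illustrative encoding. The usage is a hypothetical homogeneous case ($n = 3$, $k = d = 2$, $\alpha = 2$, $\beta = 1$), where the value 3 agrees with the classical homogeneous cut-set value $\min\{2, 2\} + \min\{2, 1\}$.

```python
from itertools import permutations, product


def sequence_value(alpha, node_seq, surv_seq, beta):
    """Inner sum of the bound for one node sequence and one surviving sequence."""
    total = 0
    for i, (u, S) in enumerate(zip(node_seq, surv_seq)):
        earlier = set(node_seq[:i])               # U_lambda_1, ..., U_lambda_{i-1}
        download = sum(beta[(u, h, S)] for h in S - earlier)
        total += min(alpha[u], download)
    return total


def min_cut_bound(alpha, recon_sets, surviving, beta):
    """Brute-force minimum over A_t, node sequences and surviving sequences."""
    return min(
        sequence_value(alpha, node_seq, surv_seq, beta)
        for A in recon_sets
        for node_seq in permutations(A)
        for surv_seq in product(*(surviving[u] for u in node_seq))
    )


# Hypothetical homogeneous toy case: n = 3, k = d = 2, alpha = 2, beta = 1.
nodes = ("U1", "U2", "U3")
surviving = {u: [frozenset(v for v in nodes if v != u)] for u in nodes}
beta = {(u, h, S): 1 for u in nodes for S in surviving[u] for h in S}
alpha = {u: 2 for u in nodes}
recon_sets = [("U1", "U2"), ("U1", "U3"), ("U2", "U3")]
print(min_cut_bound(alpha, recon_sets, surviving, beta))  # 3
```

The number of terms visited equals the count given in Remark 14, so this direct enumeration is only practical for small examples.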


Remark 14. For a given heterogeneous DSS at time $t$, if an arbitrary data collector connects to each node $U_{\lambda_i}$ of a subset $A_t \in \mathcal{A}$, then the total number of possible information flow graphs is
$$\sum_{A_t \in \mathcal{A}} |A_t|!\prod_{i=1}^{|A_t|}\tau_{\lambda_i}.$$
In particular, for a specific information flow graph, the total number of computational comparisons is $2^{|A_t|}$. Hence the time complexity of calculating the min-cut bound is
$$O\Big(\sum_{A_t \in \mathcal{A}} 2^{|A_t|}\,|A_t|!\prod_{i=1}^{|A_t|}\tau_{\lambda_i}\Big).$$
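The count in Remark 14 can be evaluated for the example of Figure 2, using $\tau = (5, 3, 2, 2, 2)$ from Table I and the seven reconstruction sets $A_1, \dots, A_7$ listed in Section II. A Python sketch (`math.prod` needs Python 3.8 or later; the function name is illustrative):

```python
from math import factorial, prod


def num_flow_graphs(recon_sets, tau):
    """Remark 14: sum over A_t of |A_t|! * prod of tau over the nodes of A_t."""
    return sum(factorial(len(A)) * prod(tau[u] for u in A) for A in recon_sets)


TAU = {"U1": 5, "U2": 3, "U3": 2, "U4": 2, "U5": 2}  # Table I
RECON_SETS = [("U1", "U2", "U3"), ("U1", "U3", "U5"), ("U1", "U4"),
              ("U2", "U4"), ("U2", "U5"), ("U3", "U4"), ("U4", "U5")]  # A_1..A_7
print(num_flow_graphs(RECON_SETS, TAU))  # 360
```

The total is dominated by the two three-node sets ($3!\cdot 30 = 180$ and $3!\cdot 20 = 120$), which shows how quickly the factorial term grows with $|A_t|$.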


By Theorem 13, one can calculate the minimum storage node capacity and repair bandwidth required to store a file of size $B$. In other words, an upper bound on the size $B$ of a stored file is given by the following lemma.

Lemma 15. If a file of size $B$ is stored in a given heterogeneous DSS $(n, k, d)$, then
$$B \le \mathcal{B}, \quad (6)$$
where $\mathcal{B}$ is given in Equation (5) and the remaining notation has the meaning defined in the previous sections.

Proof: Any data collector node $D$ must be able to reconstruct the whole file of size $B$. Hence the maximum information flow delivered to any data collector must be at least $B$. The lemma then follows from the max-flow min-cut theorem and Theorem 13. ■

Example 16. The min cut-capacity$(s, D)$ for the information flow graph shown in Figure 4 is
$$\min\Big\{\alpha_1,\ \beta\big(U_1, U_2, S_1^{(\ell_1)}\big) + \beta\big(U_1, U_4, S_1^{(\ell_1)}\big)\Big\} + \min\Big\{\alpha_2,\ \beta\big(U_2, U_4, S_2^{(\ell_2)}\big)\Big\} + \min\Big\{\alpha_3,\ \beta\big(U_3, U_4, S_3^{(\ell_3)}\big)\Big\} = 2 + 1 + 2 = 5 \text{ units}.$$

Now one can frame an optimization problem to find the minimum system storage cost and system repair cost under the constraint that the maximum possible information delivered to data collector node $D$ is at least $B$.

Problem 17.

Minimize: $[C_s(\alpha), C_r(\beta)]$
subject to

Inequality (6);

$\alpha_i \ge 0$;

$\beta(U_i, U_j, S_i^{(\ell)}) \ge 0$;

where $i \in [n]$, $\ell \in [\tau_i]$ and $U_j \in S_i^{(\ell)}$ for some $j \in [n]\setminus\{i\}$.

The optimum values of both objective functions of the bi-objective optimization Problem 17 are plotted as a tradeoff curve between $C_s(\alpha)$ and $C_r(\beta)$. In this paper, Problem 17 is solved by the weighted sum method for some numerical examples. Some specific cases of optimization Problem 17 are analyzed in the following subsection.

A. Some Specific Cases


Considered heterogeneous DSS can be reduced to follow¬ 
ing cases under some specific restrictions. The cases are as 
follow: 


1) (Uniform Reconstruction): At time t, if an arbitrary data 
collector can retrieve the file by downloading data from exactly 
k nodes for any combination out of n nodes then the constraint 
Inequality (j^ for the optimization Problem 17 has additional 
property kt = k, \/t. 


2) (Uniform Repair Degree): For a heterogeneous DSS let 
a node failure can repair by any d nodes out of remaining 
n — 1 nodes. Under the particular assumption the constraint 
Inequality ^ for the optimization Problem reduced to 








^ = min 




E 

U A ■ ^A-t 


mm < a\. 




E 


/3(c/ao 


u 


(5) 


-'Mj 


GS''?\{(7ai 


.■■■.C/Xi_i} 


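Problem 18.

Minimize: $[C_s(\alpha), C_r(\beta)]$
subject to

$$B \le \min_{A_t\in\mathcal{A}}\;\sum_{i=1}^{k_t}\min\Big\{\alpha_{\lambda_i},\sum_{U_{\mu_j}\in S^{(t)}_{\lambda_i}\setminus\{U_{\lambda_1},\dots,U_{\lambda_{i-1}}\}}\beta\big(U_{\mu_j},U_{\lambda_i},S^{(t)}_{\lambda_i}\big)\Big\};$$

$0 \le \alpha_1 \le \alpha_2 \le \cdots \le \alpha_n$;

$1 \le \lambda_1 < \lambda_2 < \cdots < \lambda_{k_t} \le n$;

where $\mu_j$ is the index of a node in $S^{(t)}_{\lambda_i}\setminus\{U_{\lambda_1},\dots,U_{\lambda_{i-1}}\}$ such that $\{U_{\lambda_1},\dots,U_{\lambda_{k_t}}\}\subseteq A_t$, for all $i \in [k_t]$ and some $j \in [d]$.

In this case $|S^{(t)}_{\lambda_i}| = d$.

Here the min cut-capacity $(s, D)$ is given by the node sequence $(U_{\lambda_i})_{i=1}^{k_t}$ of nodes in $A_t$, associated with the surviving sequence $(S^{(t)}_{\lambda_i})_{i=1}^{k_t}$, such that

$$\alpha_{\lambda_1} \le \alpha_{\lambda_2} \le \cdots \le \alpha_{\lambda_{k_t}}.$$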


3) (Uniform Repair Download Amount): In this case, we assume that the amount downloaded from any helper node to repair the system is a constant, say $\beta$. Hence, under this restriction, optimization Problem 17 has the additional properties $\beta(U_j, U_i, S^{(t)}_i) = \beta$, $\beta > 0$ (for all $i \in [n]$, all possible $U_j \in S^{(t)}_i$ with $j \in [n]\setminus\{i\}$, and all $t$).


4) (Homogeneous DSS): A heterogeneous DSS becomes a homogeneous DSS if the characteristics of its parameters are uniform. Hence, assume that the effective reconstruction degree for any data collector is $k$ and the storage capacity of each node is $\alpha$. In addition, let a failed node be repaired by any $d$ nodes out of the remaining $n - 1$ nodes by downloading $\beta$ packets from each helper node. Under these restrictions, the constraint Inequality (6) of optimization Problem 17 reduces to

Problem 19.

Minimize: $[C_s(\alpha), C_r(\beta)]$
subject to

$$B \le \sum_{i=1}^{k}\min\{\alpha, (d - i + 1)\beta\};$$

$\alpha \ge 0$;

$\beta \ge 0$.


5) (Other): The heterogeneous DSS model considered in this paper can be reduced to more specific DSSs by applying appropriate restrictions on the constraints. For example, a heterogeneous DSS with uniform reconstruction and uniform repair degree (cases 1 and 2, respectively) reduces to the heterogeneous DSS investigated in [25].
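For the homogeneous case 4), the constraint of Problem 19 can be evaluated directly, and the two end points of the homogeneous tradeoff have well-known closed forms (minimum storage, MSR: $\alpha = B/k$, $\beta = B/(k(d-k+1))$; minimum bandwidth, MBR: $\beta = 2B/(k(2d-k+1))$, $\alpha = d\beta$). The sketch below, with hypothetical function names, checks that both end points meet the constraint of Problem 19 with equality for $B = 1$, $k = 2$, $d = 3$.

```python
def homogeneous_bound(alpha, beta, k, d):
    """Right-hand side of Problem 19: sum_{i=1}^{k} min{alpha, (d - i + 1) beta}."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(1, k + 1))


def msr_point(B, k, d):
    """Minimum-storage end point of the homogeneous tradeoff."""
    return B / k, B / (k * (d - k + 1))


def mbr_point(B, k, d):
    """Minimum-bandwidth end point of the homogeneous tradeoff (alpha = d * beta)."""
    beta = 2 * B / (k * (2 * d - k + 1))
    return d * beta, beta


# Parameters of the homogeneous DSS in the numerical example: B = 1, k = 2, d = 3.
for name, (alpha, beta) in (("MSR", msr_point(1.0, 2, 3)), ("MBR", mbr_point(1.0, 2, 3))):
    print(name, alpha, beta, homogeneous_bound(alpha, beta, 2, 3))
```

Reducing $\alpha$ below $B/k$ (for example $\alpha = 0.49$ at the MSR value of $\beta$) makes the bound drop below $B$, which is why the MSR point is the storage-optimal end.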


One can easily find the solution of the bi-objective optimization Problem 17 for given numerical values and plot it as a tradeoff curve, which can then be compared with the tradeoff curve of the existing heterogeneous DSS investigated in [25]. Hence, in the next subsection we calculate some optimum solutions for numerical parameters of our model and compare them with the homogeneous model [?] and the heterogeneous model [25].
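Problem 17 is a linear program even though its constraint contains terms $\min\{\cdot,\cdot\}$. For a fixed $A_t$ and node sequence, write $a_i = \alpha_{x_i}$ and $b_i$ for the $\beta$-sum in the $i$-th term; both are linear in the decision variables. Then $\sum_i \min\{a_i, b_i\} \ge B$ holds exactly when each of the $2^{k_t}$ sums that keep either $a_i$ or $b_i$ in every slot is at least $B$. This is a standard reformulation assumed here for illustration (the section does not state which formulation was passed to the solver); the sketch below, with hypothetical names and arbitrary values, checks the equivalence numerically.

```python
import random
from itertools import product


def min_form(a, b):
    """One cut term of the constraint: sum_i min{a_i, b_i}."""
    return sum(min(ai, bi) for ai, bi in zip(a, b))


def linear_forms(a, b):
    """The 2^k linear expressions obtained by keeping a_i or b_i in each slot."""
    return [
        sum(bi if pick else ai for ai, bi, pick in zip(a, b, choice))
        for choice in product((False, True), repeat=len(a))
    ]


def linear_constraints_hold(a, b, B):
    """True iff every linear expression is at least B."""
    return all(value >= B for value in linear_forms(a, b))


# Random spot check that the two formulations agree (values are arbitrary).
rng = random.Random(0)
for _ in range(200):
    k = rng.randint(1, 5)
    a = [rng.uniform(0, 2) for _ in range(k)]
    b = [rng.uniform(0, 2) for _ in range(k)]
    B = rng.uniform(0, 5)
    assert (min_form(a, b) >= B) == linear_constraints_hold(a, b, B)
```

The price of linearity is the number of constraints: $2^{k_t}$ per node sequence, on top of the factorial number of sequences counted in Remark 14.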

B. Numerical Work 

For optimization Problem 17, LP problems with a single objective function are solved. The single objective function is obtained by taking a linear combination of the two objective functions of optimization Problem 17. Ten such LP problems are solved using distinct linear-combination factors spanning several powers of 10. The LP problems are solved and the tradeoff curves plotted with MATLAB and lp_solve [23].
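The weighted sum method can be illustrated on the homogeneous Problem 19, which has only two variables. Instead of an LP solver such as lp_solve, the sketch below uses a grid over $\beta$ and bisection on $\alpha$, which is enough in this two-variable case. It assumes $C_s = n\alpha$ and $C_r = nd\beta$ (unit per-node costs), which is an illustrative choice, not the paper's cost definition; all function names are hypothetical.

```python
def homogeneous_bound(alpha, beta, k, d):
    """Right-hand side of Problem 19: sum_{i=1}^{k} min{alpha, (d - i + 1) beta}."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(1, k + 1))


def min_alpha(beta, B, k, d, iters=60, eps=1e-12):
    """Smallest alpha meeting the Problem 19 constraint for this beta, or None."""
    if homogeneous_bound(float("inf"), beta, k, d) < B - eps:
        return None  # even unlimited storage cannot reach B with this beta
    lo, hi = 0.0, B  # alpha = B is feasible whenever any alpha is
    for _ in range(iters):
        mid = (lo + hi) / 2
        if homogeneous_bound(mid, beta, k, d) >= B - eps:
            hi = mid
        else:
            lo = mid
    return hi


def weighted_sum_point(w, B, n, k, d, steps=2000):
    """Minimise w * C_s + (1 - w) * C_r over a grid of beta values.

    Assumes C_s = n * alpha and C_r = n * d * beta (unit per-node costs).
    Returns (weighted cost, alpha, beta) for the best grid point.
    """
    best = None
    for s in range(1, steps + 1):
        beta = B * s / steps
        alpha = min_alpha(beta, B, k, d)
        if alpha is None:
            continue
        cost = w * n * alpha + (1 - w) * n * d * beta
        if best is None or cost < best[0]:
            best = (cost, alpha, beta)
    return best


# Sweep a few weights for the homogeneous example (n = 4, B = 1, k = 2, d = 3).
for w in (0.1, 0.5, 0.99):
    cost, alpha, beta = weighted_sum_point(w, 1.0, 4, 2, 3)
    print(w, round(alpha, 4), round(beta, 4))
```

Under these assumptions, small weights on storage select the minimum-bandwidth point $(\alpha, \beta) = (0.6, 0.2)$ and a weight close to 1 selects the minimum-storage point $(0.5, 0.25)$; each weight yields one point of the tradeoff curve.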

In Figure 5, four tradeoff curves between system repair cost $C_r$ and system storage cost $C_s$ are plotted for the respective DSSs. In particular, one curve is plotted for the homogeneous DSS investigated in [?], another is drawn for the heterogeneous DSS investigated in [25], and the remaining two curves are plotted for two heterogeneous DSSs studied in this paper. Of these two, one has minimum effective reconstruction degree $k_{\min} = 2$ and the other has maximum effective reconstruction degree $k_{\max} = 2$. For all considered DSSs the common parameters are: $n = 4$, $B = 1$ unit, the parameter vector $(1\ 10\ 10\ 100)$, and $r = (10\ 1\ 1\ 1)$. The homogeneous DSS and the heterogeneous DSS studied in [25] have reconstruction degree $k = 2$ and repair degree $d = 3$. Both remaining heterogeneous DSSs have surviving sets $S^{(t)}_1 = \{U_2, U_3, U_4\}$, $S^{(t)}_2 = \{U_1, U_4\}$, $S^{(t)}_3 = \{U_1, U_2\}$, and $S^{(t)}_4 = \{U_2, U_3\}$.

In Figure 5, one can see that our heterogeneous DSS model achieves better optimum system storage and repair costs than the homogeneous DSS studied in [?]. Although the characteristics of our heterogeneous model and of the heterogeneous model investigated in [25] are different, we obtained some additional optimum points for our model, as seen in Figure 5. As shown in the previous subsection, the heterogeneous DSS considered in [25] can be obtained by imposing some restrictions on our model.

Fig. 5. Optimal tradeoff curves between system repair cost $C_r$ and system storage cost $C_s$ for various DSSs.

Remark 20. Non-integer solutions of the bi-objective optimization problem are also considered in these tradeoff curves, since scaling an arbitrary file size $B$ to 1 maps an integer solution to one that is not necessarily integral.


IV. Conclusion

In this paper, we proposed a model of heterogeneous DSS with dynamic reconstruction degree, storage node capacity and repair bandwidth. In particular, at time $t$, a file can be reconstructed using a certain set of nodes, and any failed node can be repaired by contacting some set of helper nodes. For such a heterogeneous DSS, the fundamental tradeoff between system repair cost and system storage cost is investigated. To plot the tradeoff curve, a bi-objective optimization problem is formulated whose constraints are the min-cut bound and the non-negativity of the parameters of the heterogeneous DSS. The bi-objective optimization problem is solved by the weighted sum method for some numerical values of the parameters of the heterogeneous model. Analyzing the tradeoff curve, we observed some additional optimum points compared with the existing heterogeneous model [25]. The considered model is close to real-world scenarios, and it is flexible enough to be molded into any existing heterogeneous or homogeneous DSS by imposing appropriate restrictions. It would be interesting to construct codes achieving the optimum points on the tradeoff curve.